Analyzing postprandial metabolomics data using multiway models: a simulation study

Background: Analysis of time-resolved postprandial metabolomics data can improve the understanding of metabolic mechanisms, potentially revealing biomarkers for early diagnosis of metabolic diseases and advancing precision nutrition and medicine. Postprandial metabolomics measurements at several time points from multiple subjects can be arranged as a subjects by metabolites by time points array. Traditional analysis methods are limited in terms of revealing subject groups, related metabolites, and temporal patterns simultaneously from such three-way data.

Results: We introduce an unsupervised multiway analysis approach based on the CANDECOMP/PARAFAC (CP) model for improved analysis of postprandial metabolomics data guided by a simulation study. Because of the lack of ground truth in real data, we generate simulated data using a comprehensive human metabolic model. This allows us to assess the performance of CP models in terms of revealing subject groups and underlying metabolic processes. We study three analysis approaches: analysis of fasting-state data using principal component analysis, T0-corrected data (i.e., data corrected by subtracting fasting-state data) using a CP model and full-dynamic (i.e., full postprandial) data using CP. Through extensive simulations, we demonstrate that CP models capture meaningful and stable patterns from simulated meal challenge data, revealing underlying mechanisms and differences between diseased versus healthy groups.

Conclusions: Our experiments show that it is crucial to analyze both fasting-state and T0-corrected data for understanding metabolic differences among subject groups. Depending on the nature of the subject group structure, the best group separation may be achieved by CP models of T0-corrected or full-dynamic data. This study introduces an improved analysis approach for postprandial metabolomics data while also shedding light on the debate about correcting baseline values in longitudinal data analysis.
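The data arrangements described above can be illustrated with a minimal NumPy sketch. It assumes a subjects by metabolites by time points array whose first time index is the fasting state, and it drops the all-zero T0 slice after subtraction; both choices, and the function names `split_fasting_and_t0_corrected` and `cp_reconstruct`, are illustrative assumptions rather than the authors' implementation. The second function only shows how an R-component CP model rebuilds the array from its three factor matrices; it does not fit a model.

```python
import numpy as np

def split_fasting_and_t0_corrected(X):
    """Split a (subjects, metabolites, time points) array into fasting-state
    and T0-corrected parts. Time index 0 is assumed to be the fasting state."""
    fasting = X[:, :, 0]                            # subjects x metabolites matrix (e.g., for PCA)
    corrected = X[:, :, 1:] - fasting[:, :, None]   # subtract fasting state from later time points
    return fasting, corrected

def cp_reconstruct(A, B, C):
    """Rebuild a three-way array from CP factor matrices.

    A: subjects x R, B: metabolites x R, C: time points x R.
    Entry (i, j, k) equals sum_r A[i, r] * B[j, r] * C[k, r].
    """
    return np.einsum("ir,jr,kr->ijk", A, B, C)
```

The full-dynamic analysis would use `X` itself, while the fasting-state and T0-corrected analyses would use the two outputs of `split_fasting_and_t0_corrected`.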

Comparison of CP models for real vs. simulated data

Figure 4 in the main text shows that the patterns in the metabolites and time modes extracted by the 3-component CP model are similar to some extent for the simulated and real data sets. This file aims to explain their differences. In all figures presented in this supplemental file, red lines correspond to the time profiles of individual subjects, and the black line is the median time profile of that specific group of subjects.

Figures S4.1 and S4.2 show the time profiles of the subjects with large coefficients in the first component (comp1) of the 3-component CP model for the real data set. These two figures indicate that comp1 for the real data mainly captures the subjects whose highest peak appears at around 0.5 h in the metabolites Pyr, Lac, and Ala. Comparing Figures S4.1 and S4.2 with Figure S4.5, we can see that the highest peak of Pyr occurs at different time points in the real and simulated data. This leads to the shift in the time point of the highest peak in comp1 for the real vs. simulated data observed in Figure 4 in the main text.

Figures S4.3 and S4.4 present the time profiles of the subjects with large coefficients in the second component (comp2) of the 3-component CP model for the real data set. From these two figures, we can see that the concentration of Pyr in the selected subjects (subjects with large absolute coefficients in comp2 of the 3-component model for the real data), especially in subjects with large negative coefficients (i.e., Figure S4.4), has a dynamic profile similar to those of Ins and Glc, i.e., a dynamic pattern with the highest peak appearing at around 1.5 h. However, in the simulated data, Pyr has a rather different dynamic profile compared with Ins and Glc (see Figure S4.5). These observations explain why Pyr has different coefficients in comp2 for the real and simulated data sets.
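The subject selection and median profiles used in the figures of this file can be sketched as follows. This is a minimal illustration, assuming the subject-mode coefficients of one component are available as a 1-D array and the data as a subjects by metabolites by time points array; the function names are hypothetical, and only the 0.08 threshold is taken from the figure captions.

```python
import numpy as np

def select_subjects(subject_coeffs, threshold=0.08):
    """Return indices of subjects with coefficients >= threshold and <= -threshold."""
    subject_coeffs = np.asarray(subject_coeffs)
    positive = np.flatnonzero(subject_coeffs >= threshold)
    negative = np.flatnonzero(subject_coeffs <= -threshold)
    return positive, negative

def median_profile(X, subject_idx, metabolite_idx):
    """Median time profile of one metabolite over the selected subjects."""
    return np.median(X[subject_idx, metabolite_idx, :], axis=0)

def peak_time(profile, time_points):
    """Time point at which a profile reaches its highest value."""
    return time_points[int(np.argmax(profile))]
```

Applying `peak_time` to the median profile of a metabolite in the real and in the simulated data would give one way to quantify the kind of peak shift discussed above.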

Figure S4.1: Time profiles of subjects with large positive coefficients (no less than 0.08) in comp1 from the 3-component model for the preprocessed real data.

Figure S4.2: Time profiles of subjects with large negative coefficients (no greater than -0.08) in comp1 from the 3-component model for the preprocessed real data.

Figure S4.3: Time profiles of subjects with large positive coefficients (no less than 0.08) in comp2 from the 3-component model for the preprocessed real data.

Figure S4.4: Time profiles of subjects with large negative coefficients (no greater than -0.08) in comp2 from the 3-component model for the preprocessed real data.

Figure S4.5: Time profiles of each subject for the preprocessed simulated data.